Cache-oblivious Algorithms
Authors
Abstract
This thesis presents “cache-oblivious” algorithms that use asymptotically optimal amounts of work and move data asymptotically optimally among multiple levels of cache. An algorithm is cache oblivious if no program variables dependent on hardware configuration parameters, such as cache size and cache-line length, need to be tuned to minimize the number of cache misses. We show that the ordinary algorithms for matrix transposition, matrix multiplication, sorting, and Jacobi-style multipass filtering are not cache optimal. We present algorithms for rectangular matrix transposition, FFT, sorting, and multipass filters, which are asymptotically optimal on computers with multiple levels of caches. For a cache with size $Z$ and cache-line length $L$, where $Z = \Omega(L^2)$, the number of cache misses for an $m \times n$ matrix transpose is $\Theta(1 + mn/L)$. The number of cache misses for either an $n$-point FFT or the sorting of $n$ numbers is $\Theta(1 + (n/L)(1 + \log_Z n))$. The cache complexity of computing $n$ time steps of a Jacobi-style multipass filter on an array of size $n$ is $\Theta(1 + n/L + n^2/(ZL))$. We also give a $\Theta(mnp)$-work algorithm to multiply an $m \times n$ matrix by an $n \times p$ matrix that incurs $\Theta(m + n + p + (mn + np + mp)/L + mnp/(L\sqrt{Z}))$ cache misses. We introduce an “ideal-cache” model to analyze our algorithms, and we prove that an optimal cache-oblivious algorithm designed for two levels of memory is also optimal for multiple levels. We further prove that any optimal cache-oblivious algorithm is also optimal in the previously studied HMM and SUMH models. Algorithms developed for these earlier models are perforce cache-aware: their behavior varies as a function of hardware-dependent parameters, which must be tuned to attain optimality. Our cache-oblivious algorithms achieve the same asymptotic optimality on all these models, but without any tuning.

Thesis Supervisor: Charles E. Leiserson
Title: Professor of Computer Science and Engineering
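To make the idea of a cache-oblivious algorithm concrete, here is a minimal sketch of a divide-and-conquer matrix transpose. The abstract does not spell out the thesis's algorithm, so this should be read as an illustration of the general recursive-halving technique, not as the author's method. The names `transpose`, `transpose_rec`, and the `cutoff` parameter are hypothetical. Python lists of lists are not contiguous in memory, so the sketch shows only the recursion structure, not real cache behavior.

```python
def transpose_rec(a, b, r0, r1, c0, c1, cutoff):
    """Write the transpose of the block a[r0:r1][c0:c1] into b (b[j][i] = a[i][j])."""
    rows, cols = r1 - r0, c1 - c0
    if rows <= cutoff and cols <= cutoff:
        # Base case: the block is small, so copy it element by element.
        for i in range(r0, r1):
            for j in range(c0, c1):
                b[j][i] = a[i][j]
    elif rows >= cols:
        # Halve the longer dimension (rows) and recurse on both halves.
        mid = r0 + rows // 2
        transpose_rec(a, b, r0, mid, c0, c1, cutoff)
        transpose_rec(a, b, mid, r1, c0, c1, cutoff)
    else:
        # Halve the longer dimension (columns) and recurse on both halves.
        mid = c0 + cols // 2
        transpose_rec(a, b, r0, r1, c0, mid, cutoff)
        transpose_rec(a, b, r0, r1, mid, c1, cutoff)


def transpose(a, cutoff=16):
    """Return the transpose of the m x n matrix a, given as a list of rows."""
    if cutoff < 1:
        raise ValueError("cutoff must be at least 1")
    m = len(a)
    n = len(a[0]) if m else 0
    b = [[None] * m for _ in range(n)]
    transpose_rec(a, b, 0, m, 0, n, cutoff)
    return b
```

The `cutoff` here is an assumed fixed constant that only limits recursion overhead. It is not derived from $Z$ or $L$, which is the point of the cache-oblivious approach: the recursion eventually produces blocks that fit in cache at every level without the code knowing the cache parameters.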
Similar resources
A Comparison of Cache Aware and Cache Oblivious Static Search Trees Using Program Instrumentation
An experimental comparison of cache aware and cache oblivious static search tree algorithms is presented. Both cache aware and cache oblivious algorithms outperform classic binary search on large data sets because of their better utilization of cache memory. Cache aware algorithms with implicit pointers perform best overall, but cache oblivious algorithms do almost as well and do not have to be...
Cache-oblivious wavefront algorithms for dynamic programming problems: efficient scheduling with optimal cache performance and high parallelism
Wavefront algorithms are algorithms on grids where execution proceeds in a wavefront manner from the start to the end of the execution (execution moves through the grid as if a wavefront is moving). Many dynamic programming problems and stencil computations are wavefront algorithms. Iterative wavefront algorithms for evaluating dynamic programming (DP) recurrences exploit optimal parallelism, b...
Funnel Heap - A Cache Oblivious Priority Queue
The cache oblivious model of computation is a two-level memory model with the assumption that the parameters of the model are unknown to the algorithms. A consequence of this assumption is that an algorithm efficient in the cache oblivious model is automatically efficient in a multi-level memory model. Arge et al. recently presented the first optimal cache oblivious priority queue, and demonstr...
Cache-aware and Cache-oblivious Algorithms
Cache Efficient Simple Dynamic Programming
New cache-oblivious and cache-aware algorithms for simple dynamic programming based on Valiant’s context-free language recognition algorithm are designed, implemented, analyzed, and empirically evaluated with timing studies and cache simulations. The studies show that for large inputs the cache-oblivious and cache-aware dynamic programming algorithms are significantly faster than the standard d...